{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "56b08e7a-f016-4783-a4ff-1bc51bf5534b",
   "metadata": {},
   "source": [
    "## The ChatGPT format\n",
    "Each entry under the `chatgpt` key is one conversation: a list of messages, each a dict with a `role` (`system`, `user`, or `assistant`) and a `content` string.\n",
    "\n",
    "```python\n",
    "chatgpt_format = {\n",
    "    \"chatgpt\": [\n",
    "        [\n",
    "            {\"role\": \"system\", \"content\": \"You are a human.\"},\n",
    "            {\"role\": \"user\", \"content\": \"No I am not.\"},\n",
    "            {\"role\": \"assistant\", \"content\": \"I am a robot.\"},\n",
    "        ],\n",
    "    ]\n",
    "}\n",
    "```\n",
    "### Example: converting a Hugging Face dataset to the ChatGPT format and uploading it to a Hugging Face repo"
   ]
  },
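  {
   "cell_type": "markdown",
   "id": "format-validation-sketch",
   "metadata": {},
   "source": [
    "The structure above can be checked mechanically before converting or uploading data. The following is a minimal sketch, not part of any library: `validate_conversation` and `ALLOWED_ROLES` are hypothetical names, and the rules (three known roles, string content) are assumptions drawn only from the example above.\n",
    "\n",
    "```python\n",
    "# Assumption: only the three roles used in this notebook are valid.\n",
    "ALLOWED_ROLES = {'system', 'user', 'assistant'}\n",
    "\n",
    "\n",
    "def validate_conversation(conversation):\n",
    "    # Hypothetical helper: returns a list of problems; an empty list means\n",
    "    # the conversation matches the structure shown above.\n",
    "    problems = []\n",
    "    for i, message in enumerate(conversation):\n",
    "        if not isinstance(message, dict) or set(message) != {'role', 'content'}:\n",
    "            problems.append(f'message {i}: expected a dict with keys role and content')\n",
    "            continue\n",
    "        role, content = message['role'], message['content']\n",
    "        if not isinstance(role, str) or role not in ALLOWED_ROLES:\n",
    "            problems.append(f'message {i}: unknown role {role!r}')\n",
    "        if not isinstance(content, str):\n",
    "            problems.append(f'message {i}: content must be a string')\n",
    "    return problems\n",
    "\n",
    "\n",
    "example = [\n",
    "    {'role': 'system', 'content': 'You are a human.'},\n",
    "    {'role': 'user', 'content': 'No I am not.'},\n",
    "    {'role': 'assistant', 'content': 'I am a robot.'},\n",
    "]\n",
    "print(validate_conversation(example))\n",
    "```\n",
    "\n",
    "Returning a list of problems rather than raising on the first one makes it easier to report every malformed message in a large dataset in a single pass."
   ]
  },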
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "c27e49b3-9ce2-481f-9e55-f008abeadc3d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'text': '<HUMAN>: What is a panic attack?\\n<ASSISTANT>: Panic attacks come on suddenly and involve intense and often overwhelming fear. They’re accompanied by very challenging physical symptoms, like a racing heartbeat, shortness of breath, or nausea. Unexpected panic attacks occur without an obvious cause. Expected panic attacks are cued by external stressors, like phobias. Panic attacks can happen to anyone, but having more than one may be a sign of panic disorder, a mental health condition characterized by sudden and repeated panic attacks.'} 172\n"
     ]
    }
   ],
   "source": [
    "from datasets import load_dataset\n",
    "\n",
    "# Load the train split; each record is a dict with a single 'text' field.\n",
    "data = load_dataset(\"heliosbrahma/mental_health_chatbot_dataset\")[\"train\"]\n",
    "print(data[0], len(data))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "351d1a81-c1c8-4bb0-9b8c-e14b54ff73f6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[{'role': 'system', 'content': 'You are a mental health assistant.'}, {'role': 'user', 'content': 'What is a panic attack?\\n'}, {'role': 'assistant', 'content': 'Panic attacks come on suddenly and involve intense and often overwhelming fear. They’re accompanied by very challenging physical symptoms, like a racing heartbeat, shortness of breath, or nausea. Unexpected panic attacks occur without an obvious cause. Expected panic attacks are cued by external stressors, like phobias. Panic attacks can happen to anyone, but having more than one may be a sign of panic disorder, a mental health condition characterized by sudden and repeated panic attacks.'}]\n"
     ]
    }
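   ],
   "source": [
    "chatgpt_format = {\n",
    "    \"chatgpt\": []\n",
    "}\n",
    "\n",
    "SYSTEM_PROMPT = \"You are a mental health assistant.\"\n",
    "HUMAN_TAG = \"<HUMAN>: \"\n",
    "ASSISTANT_TAG = \"<ASSISTANT>: \"\n",
    "\n",
    "for d in data:\n",
    "    text = d[\"text\"]\n",
    "    # Each record has the form: <HUMAN>: question, newline, <ASSISTANT>: answer.\n",
    "    assistant_word_i = text.find(\"<ASSISTANT>\")\n",
    "    # The user turn keeps its trailing newline, as the printed output shows.\n",
    "    human_text = text[len(HUMAN_TAG):assistant_word_i]\n",
    "    assistant_text = text[assistant_word_i + len(ASSISTANT_TAG):]\n",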
    "\n",
    "    chatgpt_format[\"chatgpt\"].append(\n",
    "        [\n",
    "            {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n",
    "            {\"role\": \"user\", \"content\": human_text},\n",
    "            {\"role\": \"assistant\", \"content\": assistant_text}\n",
    "        ])\n",
    "\n",
    "print(chatgpt_format[\"chatgpt\"][0])"
   ]
  },
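  {
   "cell_type": "markdown",
   "id": "robust-parsing-sketch",
   "metadata": {},
   "source": [
    "The slicing in the cell above assumes every record starts with `<HUMAN>: ` and contains an `<ASSISTANT>: ` tag. A slightly more defensive variant might look like the sketch below. `split_turns` is a hypothetical helper, the single-turn layout is an assumption based on the first record printed earlier, and unlike the cell above it strips surrounding whitespace (including the trailing newline of the user turn).\n",
    "\n",
    "```python\n",
    "HUMAN_TAG = '<HUMAN>:'\n",
    "ASSISTANT_TAG = '<ASSISTANT>:'\n",
    "\n",
    "\n",
    "def split_turns(text):\n",
    "    # Hypothetical helper: split one single-turn record into\n",
    "    # (human_text, assistant_text), raising if the tags are missing.\n",
    "    human_part, separator, assistant_part = text.partition(ASSISTANT_TAG)\n",
    "    if not separator or not human_part.startswith(HUMAN_TAG):\n",
    "        raise ValueError('record does not match the expected tag layout')\n",
    "    return human_part[len(HUMAN_TAG):].strip(), assistant_part.strip()\n",
    "\n",
    "\n",
    "sample = '<HUMAN>: What is a panic attack?\\n<ASSISTANT>: Panic attacks come on suddenly.'\n",
    "print(split_turns(sample))\n",
    "```\n",
    "\n",
    "Raising on a malformed record, instead of slicing at a fixed offset, surfaces bad rows early rather than silently producing truncated messages."
   ]
  },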
  {
   "cell_type": "markdown",
   "id": "ab541655-c234-41ca-9f7a-2f20f1ad3c01",
   "metadata": {},
   "source": [
    "### Publish to a Hugging Face repo"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "88fd3ef0-1bc4-42d6-af93-b670b64a44c9",
   "metadata": {},
   "outputs": [],
   "source": [
    "from datasets import Dataset\n",
    "\n",
    "dataset = Dataset.from_dict(chatgpt_format)\n",
    "dataset.push_to_hub(\"<HUGGING_FACE_DATASET_REPO>\", token=\"<HUGGING_FACE_TOKEN>\")  # Example dataset repo: 'test/test_dataset'"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
